Limiting Density Of Discrete Points

In information theory, the limiting density of discrete points is an adjustment to the formula of Claude Shannon for differential entropy. It was formulated by Edwin Thompson Jaynes to address defects in the initial definition of differential entropy.


Definition

Shannon originally wrote down the following formula for the entropy of a continuous distribution, known as differential entropy:
: h(X) = -\int p(x)\log p(x)\,dx.

Unlike Shannon's formula for the discrete entropy, however, this is not the result of any derivation (Shannon simply replaced the summation symbol in the discrete version with an integral), and it lacks many of the properties that make the discrete entropy a useful measure of uncertainty. In particular, it is not invariant under a change of variables and can become negative. In addition, it is not even dimensionally correct: since P(x) would be dimensionless, p(x) must have units of \frac{1}{dx}, which means that the argument to the logarithm is not dimensionless, as required.
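Both defects are easy to see numerically. The following is a minimal sketch (assuming NumPy is available; the uniform density on [0, 1/2] and the rescaling Y = 10X are illustrative choices, not taken from the article) showing that h(X) can be negative and changes under a simple change of variables:

import numpy as np

def differential_entropy(p, t):
    # Riemann-sum approximation of h = -int p log p dt, in nats.
    dt = t[1] - t[0]
    return -np.sum(p * np.log(p)) * dt

# X uniform on [0, 1/2]: p(x) = 2, so h(X) = -log 2 < 0.
x = np.linspace(0.0, 0.5, 20001)
p = np.full_like(x, 2.0)
print(differential_entropy(p, x))   # ~ -0.693: a negative "entropy"

# Y = 10 X is uniform on [0, 5]: h(Y) = h(X) + log 10, a different value.
y = np.linspace(0.0, 5.0, 20001)
q = np.full_like(y, 0.2)
print(differential_entropy(q, y))   # ~ +1.609: h is not invariant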
Jaynes argued that the formula for the continuous entropy should be derived by taking the limit of increasingly dense discrete distributions. Suppose that we have a set of N discrete points \{x_i\}, such that in the limit N \to \infty their density approaches a function m(x) called the "invariant measure":

: \lim_{N \to \infty}\frac{1}{N}\,(\text{number of points in } a < x < b) = \int_a^b m(x)\,dx.

Jaynes derived from this the following formula for the continuous entropy, which he argued should be taken as the correct formula:

: \lim_{N \to \infty} H_N(X) = \log(N) - \int p(x)\log\frac{p(x)}{m(x)}\,dx.

Typically, when this is written, the term \log(N) is omitted, as that would typically not be finite. So the actual common definition is

: H(X) = -\int p(x)\log\frac{p(x)}{m(x)}\,dx.

Where it is unclear whether or not the \log(N) term should be omitted, one could write

: H_N(X) \sim \log(N) + H(X).
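As a concrete illustration of the common definition, the sketch below (assuming NumPy; the choices p(x) = 2x and a uniform invariant measure m(x) = 1 on [0, 1] are illustrative, not from the article) evaluates H(X) by numerical integration:

import numpy as np

x = np.linspace(1e-9, 1.0, 200001)
dx = x[1] - x[0]
p = 2.0 * x            # probability density of X on [0, 1]
m = np.ones_like(x)    # invariant measure: uniform density on [0, 1]

# H(X) = -int p(x) log(p(x)/m(x)) dx, in nats.
H = -np.sum(p * np.log(p / m)) * dx
print(H)               # ~ -0.193 = -(log 2 - 1/2); note that H <= 0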
Notice that in Jaynes' formula, m(x) is a probability density. For any finite N, m(x) corresponds to a uniform density over the quantization of the continuous space that is used in the Riemann sum. In the limit, m(x) is the continuous limiting density of points in the quantization used to represent the continuous variable x.

Suppose one had a number format that took on N possible values, distributed as per m(x). Then H_N(X) (if N is large enough that the continuous approximation is valid) is the discrete entropy of the variable x in this encoding. This is equal to the average number of bits required to transmit this information, and is no more than \log(N).
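This interpretation can be checked directly: quantize the line with N cells whose density of points follows m(x) and compute the discrete entropy of the quantized variable. A sketch, assuming NumPy and the illustrative choices m(x) = 2x and p(x) = 3x^2 on [0, 1] (chosen so that both cumulative distribution functions have simple closed forms; natural logarithms are used, so entropies are in nats rather than bits):

import numpy as np

# N quantization cells whose point density follows m(x) = 2x on [0, 1];
# the cell edges come from the inverse of its CDF, M^{-1}(u) = sqrt(u).
N = 4096
edges = np.sqrt(np.linspace(0.0, 1.0, N + 1))

# Probability mass of p(x) = 3x^2 in each cell, via its CDF P(x) = x^3.
P = np.diff(edges**3)

H_N = -np.sum(P * np.log(P))           # discrete entropy of the encoding (nats)

# H(X) = -int p log(p/m) dx = -(log(3/2) - 1/3) for these densities.
H = -(np.log(1.5) - 1.0 / 3.0)
print(H_N, np.log(N) + H)              # the two values agree closely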
Therefore, H(X) may be thought of as the amount of information gained by knowing that the variable x follows the distribution p(x), and is not uniformly distributed over the possible quantized values, as would be the case if it followed m(x). H(X) is actually the (negative) Kullback–Leibler divergence from m(x) to p(x), which is thought of as the information gained by learning that a variable previously thought to be distributed as m(x) is actually distributed as p(x).

Jaynes' continuous entropy formula has the property of being invariant under a change of variables, provided that m(x) and p(x) are transformed in the same way. (This motivates the name "invariant measure" for ''m''.) This solves many of the difficulties that come from applying Shannon's continuous entropy formula.
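A sketch of this invariance (assuming NumPy; the densities p(x) = 2x and m(x) = 3x^2 on (0, 1) and the transformation Y = X^2 are illustrative choices): when p and m are transformed with the same Jacobian, H is unchanged, while the plain differential entropy is not:

import numpy as np

def H(p, m, t):
    # Riemann-sum approximation of -int p log(p/m) dt, in nats.
    dt = t[1] - t[0]
    return -np.sum(p * np.log(p / m)) * dt

# Densities of X on (0, 1): p(x) = 2x and invariant measure m(x) = 3x^2.
x = np.linspace(1e-6, 1.0, 400001)
p_x = 2.0 * x
m_x = 3.0 * x**2

# Change of variables Y = X**2; both densities pick up the same Jacobian
# factor |dx/dy| = 1 / (2 sqrt(y)).
y = np.linspace(1e-6, 1.0, 400001)
jac = 1.0 / (2.0 * np.sqrt(y))
p_y = 2.0 * np.sqrt(y) * jac          # the density of Y (here uniform)
m_y = 3.0 * y * jac                   # the transformed invariant measure

print(H(p_x, m_x, x), H(p_y, m_y, y))              # equal (~ -0.095)
print(-np.sum(p_x * np.log(p_x)) * (x[1] - x[0]),
      -np.sum(p_y * np.log(p_y)) * (y[1] - y[0]))  # differ: h is not invariant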
Jaynes himself dropped the \log(N) term, as it was not relevant to his work (maximum entropy distributions), and it is somewhat awkward to have an infinite term in the calculation. Unfortunately, this cannot be helped if the quantization is made arbitrarily fine, as would be the case in the continuous limit. Note that H(X) as defined here (without the \log(N) term) is always non-positive, because a Kullback–Leibler divergence is always non-negative.

If m(x) is constant over some interval of size r, and p(x) is essentially zero outside that interval, then the limiting density of discrete points (LDDP) is closely related to the differential entropy h(X): since m(x) = 1/r on the support of p(x),

: H_N(X) \approx \log(N) - \log(r) + h(X).
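A numerical sketch of this relation (assuming NumPy; the interval [0, r] with r = 1/2, the uniform quantization into N cells, and the Beta(2,2)-shaped density are illustrative choices):

import numpy as np

# Uniform quantization of [0, r] into N cells, so m(x) = 1/r on the interval.
r, N = 0.5, 4096
edges = np.linspace(0.0, r, N + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
width = r / N

# A density that vanishes outside [0, r] (Beta(2,2) rescaled to the interval).
p = 6.0 * centers * (r - centers) / r**3

# Differential entropy h(X) by Riemann sum, and the discrete entropy H_N of
# the cell probabilities P_i ~ p(x_i) * width of the quantized variable.
h = -np.sum(p * np.log(p) * width)
P = p * width
H_N = -np.sum(P * np.log(P))

print(H_N, np.log(N) - np.log(r) + h)   # the two values agree closely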

